Distributed RDF Triple Store Using HBase and Hive

Authors

  • Albert Haque
  • Lynette Perkins
Abstract

The growth of web data has presented new challenges regarding the ability to effectively query RDF data. Traditional relational database systems efficiently scale and query distributed data. With the development of Hadoop and its implementation of the MapReduce framework, along with HBase, a NoSQL data store, the semantics of processing and querying data have changed. Given the existing structure of SPARQL, we devised and are continuing to develop a means to query RDF data effectively. Using HBase as the data store and HiveQL as the query language, we implement a prototype intermediate translator that generates HiveQL from a SPARQL query. The generated HiveQL query then returns data from our distributed HBase store.
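To make the idea of a SPARQL-to-HiveQL translator concrete, here is a minimal sketch. It is not the authors' implementation: it assumes a hypothetical Hive table named `triples` with columns `subject`, `predicate`, and `object` (for example, an external table mapped onto HBase), and it handles only a basic graph pattern given as already-parsed triple patterns. The function name `bgp_to_hiveql` and the table layout are illustrative assumptions.

```python
def is_var(term):
    """SPARQL variables start with '?'."""
    return term.startswith("?")


def bgp_to_hiveql(select_vars, patterns, table="triples"):
    """Translate a basic graph pattern into a HiveQL self-join query.

    `patterns` is a list of (subject, predicate, object) tuples.
    Assumes a hypothetical table with one row per RDF triple.
    """
    columns = ("subject", "predicate", "object")
    bindings = {}    # variable -> first column reference that binds it
    conditions = []  # WHERE conditions (constants and join equalities)

    for i, pattern in enumerate(patterns):
        alias = f"t{i}"  # one table alias per triple pattern
        for column, term in zip(columns, pattern):
            ref = f"{alias}.{column}"
            if is_var(term):
                if term in bindings:
                    # Repeated variable: join on equality with first use.
                    conditions.append(f"{bindings[term]} = {ref}")
                else:
                    bindings[term] = ref
            else:
                # Constant term: filter on a string literal.
                escaped = term.replace("'", "\\'")
                conditions.append(f"{ref} = '{escaped}'")

    for var in select_vars:
        if var not in bindings:
            raise ValueError(f"unbound variable in SELECT: {var}")

    projection = ", ".join(f"{bindings[v]} AS {v[1:]}" for v in select_vars)
    from_clause = ", ".join(f"{table} t{i}" for i in range(len(patterns)))
    query = f"SELECT {projection} FROM {from_clause}"
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


# SPARQL: SELECT ?name WHERE { ?person rdf:type foaf:Person .
#                              ?person foaf:name ?name }
example = bgp_to_hiveql(
    ["?name"],
    [("?person", "rdf:type", "foaf:Person"),
     ("?person", "foaf:name", "?name")],
)
print(example)
```

Each triple pattern becomes one alias of the table, constants become filters, and a variable shared between patterns becomes a join condition. A real translator would also need a SPARQL parser and support for features such as OPTIONAL and FILTER, which this sketch omits.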


Similar Resources

Jena-HBase: A Distributed, Scalable and Efficient RDF Triple Store

Abstract. Lack of scalability is one of the most significant problems faced by single machine RDF data stores. The advent of Cloud Computing has paved a way for a distributed ecosystem of RDF triple stores that can potentially allow up to a planet scale storage along with distributed query processing capabilities. Towards this end, we present Jena-HBase, a HBase backed triple store that can be ...


A MapReduce Approach to NoSQL RDF Databases

In recent years, the increased need to house and process large volumes of data has prompted the need for distributed storage and querying systems. The growth of machine-readable RDF triples has prompted both industry and academia to develop new database systems, called “NoSQL,” with characteristics that differ from classical databases. Many of these systems compromise ACID properties for increa...


Data Mining over Large Datasets Using Hadoop in Cloud Environment

There is drastic growth of data in web applications and social networking, and such data is referred to as Big Data. Hive queries integrated with Hadoop are used to generate report analyses for thousands of datasets. Retrieving those datasets consumes a huge amount of time, and performance analysis is lacking. To overcome this problem the Market Basket Analysis ...


How to feed Apache HBase with Petabytes of RDF Data: An Extremely Scalable RDF Store Based on Eclipse RDF4J Framework and Apache HBase Database

The majority of systems designed to handle big RDF data rely on a single high-end computer dedicated to a certain RDF dataset and do not easily scale out. At the same time, several clustered solutions were tested, and both the features and the benchmark results were unsatisfying. In this paper we describe a system designed to tackle such issues, a system that connects RDF4J and Apache HBase in ord...


H2RDF+: High-performance distributed joins over large-scale RDF graphs

The proliferation of data in RDF format calls for efficient and scalable solutions for their management. While scalability in the era of big data is a hard requirement, modern systems fail to adapt based on the complexity of the query. Current approaches do not scale well when faced with substantially complex, non-selective joins, resulting in exponential growth of execution times. In this work...



Publication date: 2012